Many studies have shed light on rater criteria for assessing the performance of language skills (e.g., Eckes, 2005). However, the interface between rater assessment and interlanguage pragmatics (ILP) has remained largely unexamined. To address this interface, this study explored the ratings that native English-speaking (NES) and nonnative English-speaking (NNES) teachers assigned to second language (L2) refusal production and the criteria they applied in their ratings. To this end, 50 NES and 50 NNES teachers rated EFL learners' L2 refusal production, which consisted of responses to a six-item written discourse completion task. The data were analyzed qualitatively and quantitatively. Qualitative analysis showed that native teachers applied 11 criteria and nonnative teachers applied 6 criteria in their pragmatic ratings. Reasoning/explanation was the leading criterion among native raters, whereas politeness was the main criterion among nonnative raters. Quantitative analysis documented variation in how often raters drew on each criterion, as well as significant differences in ratings, with NNES teachers being more lenient and more divergent. The results suggest a gap between NES and NNES teachers in rating criteria, strictness, and convergence.

Several studies have shed light on the criteria used in assessing language skills (e.g., Eckes, 2005). However, little research has addressed the interface between assessment and interlanguage pragmatics. To address this interface, the study examined how teachers who are native speakers of English and teachers whose first language is not English evaluated second language refusal production, as well as the criteria they used. To this end, 50 native English-speaking teachers and 50 teachers whose first language was not English rated the L2 refusal production of learners of English as a foreign language, namely their written responses to a six-item completion task. The data were subjected to quantitative and qualitative analyses. The qualitative analysis indicated that, in their ratings, the native English-speaking teachers applied 11 criteria and the teachers whose first language is not English applied 6. The main rating criterion for the native English speakers was reasoning/explanation, whereas for the nonnative teachers it was politeness. The quantitative analysis revealed variation in how often the rating criteria were used and significant differences in the ratings, with those of the nonnative teachers showing less severity and more divergence. The results indicate a gap in criteria, severity, and homogeneity between the ratings of native and nonnative English-speaking teachers.
Introduction


One component of pragmatic competence is knowing how to perform a particular speech act. Among speech acts, refusal is highly complicated, primarily because it often involves lengthy negotiations and face-saving manoeuvres to accommodate its noncompliant nature. Moreover, because refusal normally functions as a second pair part, it leaves the refuser little opportunity for extensive planning.

Against this backdrop, the study of how speech acts such as refusal are rated is important on two grounds. First, refusal is a commonly used speech act in communication and hence is a constituent of many pragmatic assessment tasks. Second, because refusal is realized differently across cultures and communicative situations, nonnative teachers should become familiar with native criteria for rating refusal production, particularly in the outer circle (Kachru, 1997), where there are no established local English norms for pragmatic appropriateness. Despite this importance in the outer circle context, little research has been conducted to date on the criteria used by nonnative English speakers (NNESs) in rating refusal production, as measured against native English speaker (NES) baseline sociopragmatic and pragmalinguistic norms for the appropriateness of speech act production. This is particularly important in a foreign language context, or in the expanding circle, where there is no local variety of English and hence nonnative speakers are "norm dependent," that is, dependent on native speaker norms in their rating (Kachru, 1992). Accordingly, this study aimed to investigate the pragmatic rating of second language (L2) refusal production by nonnative teachers, as measured against native English-speaking teachers' ratings.

Rating of Learner Productions 

In performance assessment, judgments are affected by raters' perceptual vantage points. Rater perceptions introduce highly subjective factors that can make ratings inaccurate. Rater bias is a major problem when language raters judge learners' performance using criteria that are vague or highly subjective, because such criteria invite inconsistency and inaccuracy. In fact, assessment of learners' performance is a complex process with many ramifications. Knoch, Read, and von Randow (2007) point out that raters' judgments are prone to various sources of bias and error that can ultimately undermine the quality of the ratings.

A number of studies using different psychometric methods have identified various rater effects (e.g., Myford & Wolfe, 2003, 2004) that need to be addressed if an acceptable level of reliability is to be maintained. Rater effects can be summarized as (a) the severity effect, (b) the halo effect, (c) the central tendency effect, (d) inconsistency, and (e) the bias effect (Myford & Wolfe, 2003). Studies of language performance assessment, as reviewed by Eckes (2005), have shown a significant range of rater effects. In particular, these studies identified differences in raters' severity or leniency (e.g., Engelhard, 1994; Engelhard & Myford, 2003; Lumley & McNamara, 1995). These differences were found to be resistant to rater training (Barrett, 2001; Lumley & McNamara, 1995; Weigle, 1998) and to persist in raters for a long time (Fitzpatrick, Ercikan, Yen, & Ferrara, 1998). Furthermore, researchers identified significant effects for rater-ratee interaction (Kondo-Brown, 2002; Lynch & McNamara, 1998), rater-task type interaction (Lynch & McNamara, 1998; Wigglesworth, 1993), and rater-criteria interaction (Wigglesworth, 1993).

Rater effects need more attention, as they are sources of systematic variance in observed ratings associated with raters rather than ratees (Cronbach, 1995; Hoyt, 2000; Myford & Wolfe, 2003). Because such effects are irrelevant to the construct being rated, they threaten the validity of the assessment procedure (Bachman, 2004; Messick, 1989, 1995; Weir, 2005). Two rater effects related to the main theme of this study are severity and inconsistency. The former occurs when raters rate either too harshly or too leniently compared with other raters or with established baseline ratings. The latter occurs when raters apply different criteria or apply criteria inconsistently. For example, they might favour a certain group of test takers or mainly apply one criterion at the expense of others. The variability of ratings resulting from these effects has been addressed in studies on speaking and writing (e.g., Schaefer, 2008; Shi, 2001). One source of rater variability is the rater's status as a native or nonnative speaker. It is therefore very important to determine whether native English-speaking and nonnative English-speaking raters use the same criteria for rating tasks. However, the results of studies comparing NES and NNES raters of oral and written language performance vary. Barnwell (1989) found that NESs were harsher in their evaluations than NNESs, whereas others, such as Fayer and Krasinski (1987), found that NNES raters were more severe. Specifically, Fayer and Krasinski investigated the speech act production of Puerto Rican learners of English and gave their samples to two groups of raters: NESs and Puerto Rican speakers. Their results revealed that the NNES raters were harsher than the NES raters, especially with respect to pronunciation errors.

Although the literature is replete with references to native speaker assessment of speaking and writing performance, only two studies on rater variability appear to relate to pragmatic rating (Taguchi, 2011; Youn, 2007). Taguchi studied native speakers' ratings of two types of speech acts produced by EFL learners. The data revealed similarities and differences in the raters' use of pragmatic norms and social rules in evaluating the appropriateness of speech acts. Focusing on Korean as a foreign language, Youn's study showed different degrees of severity in native Korean raters' ratings of speech act performance. However, neither study compared native raters' criteria for pragmatic assessment with those of nonnative raters. As a result, this issue remains underexplored.
Refusal: Nature and Strategies

Refusal functions as a response to an initiating act and is considered a speech act in which "a speaker fails to engage in an action proposed by the interlocutor" (Chen, Ye, & Zhang, 1995, p. 121). Refusal is a face-threatening act because it contradicts the listener's wants. Negotiating a refusal requires calibrating directness or indirectness, along with the degree of politeness appropriate to the situation (Eslami, 2010). In addition, refusal behaviours vary across cultures, and pragmatic transfer occurs as learners rely on their "deeply held native values to carry out complicated and face-threatening speech acts like refusals" (Beebe, Takahashi, & Uliss-Weltz, 1990, p. 68). Hence, a proper understanding and production of refusal, and in turn its rating, require a certain amount of culture-specific knowledge.

As refusal is face-threatening, it usually involves a long negotiated sequence, and its form and content vary depending on situational variables such as power, distance, and imposition. Saying "no" to requests, invitations, offers, and suggestions is a dispreferred action that is typically complex, mitigated, indirect, and accompanied by prefaces, hesitations, repairs, apologies, and accounts (e.g., Levinson, 1983; Pomerantz, 1984).

Various strategies are needed to avoid offending one's interlocutors. Takahashi and Beebe (1987) noted that an inability to say "no" politely will lead to offense. Given the face-threatening nature of this speech act and the degree of risk involved in refusing, pragmatic knowledge helps EFL learners select appropriate strategies. However, cultural issues add a further layer of complexity; in some cases, as in Ishihara and Tarone's (2009) study, L2 speakers intentionally resist what they perceive as native-speaker norms.

Beebe et al. (1990) classified refusals into semantic formulas and adjuncts that realize refusal strategies. This taxonomy includes both direct and indirect strategies. The direct category includes two semantic formulas: performative statements (e.g., I refuse it) and nonperformative statements (e.g., I can't). The indirect category includes 11 semantic formulas: statement of regret, wish, excuse/reason/explanation, statement of alternative, set condition for future or past acceptance, promise of future acceptance, statement of principle, statement of philosophy, attempt to dissuade interlocutor, acceptance that functions as a refusal, and avoidance.
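For readers who code WDCT responses against Beebe et al.'s (1990) taxonomy, the formulas listed above can be held in a simple lookup table. The following Python sketch is illustrative only: the label strings and the helpers `category_of` and `directness_profile` are hypothetical, and the taxonomy's adjuncts are omitted because they are not enumerated here.

```python
# Illustrative lookup table of the semantic formulas in Beebe et al.'s (1990)
# refusal taxonomy, as listed in the text. Label strings are hypothetical;
# adjuncts are not enumerated in the text and are therefore omitted.
DIRECT = (
    "performative",     # e.g., "I refuse it"
    "nonperformative",  # e.g., "I can't"
)

INDIRECT = (
    "statement of regret",
    "wish",
    "excuse/reason/explanation",
    "statement of alternative",
    "set condition for future or past acceptance",
    "promise of future acceptance",
    "statement of principle",
    "statement of philosophy",
    "attempt to dissuade interlocutor",
    "acceptance that functions as a refusal",
    "avoidance",
)


def category_of(formula: str) -> str:
    """Return 'direct' or 'indirect' for a semantic formula label."""
    key = formula.strip().lower()
    if key in DIRECT:
        return "direct"
    if key in INDIRECT:
        return "indirect"
    raise KeyError(f"unknown semantic formula: {formula!r}")


def directness_profile(coded_response: list[str]) -> dict[str, int]:
    """Count direct and indirect formulas in one coded refusal (hypothetical helper)."""
    profile = {"direct": 0, "indirect": 0}
    for formula in coded_response:
        profile[category_of(formula)] += 1
    return profile
```

For example, a refusal coded as statement of regret, nonperformative statement, and excuse/reason/explanation would yield one direct and two indirect formulas.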

The Current Study 

This study aimed to investigate native English-speaking raters' and nonnative English-speaking raters' criteria for rating EFL learners' pragmatic production of refusals. To do so, the following research questions were addressed:
1. What criteria are used by native and nonnative English-speaking raters in rating the speech act of refusal produced by EFL learners?

2. Is there any significant difference between native and nonnative English-speaking raters in rating the speech act of refusal produced by EFL learners?

Method 

Participants 

One group of participants included 50 educated native teachers of English from 
the United States, the United Kingdom, Canada, and Australia. The homepage 
data and the background information they provided clearly showed that they 
were NESs from these four countries. They were faculty members teaching ESL at 
different language centres in international universities. The other group consisted 
of 50 NNES teachers. Each had at least three years of teaching experience 
and held an MA degree in applied linguistics. The nonnative teachers were from different language centres in Iran, where English is taught as a foreign language. Both groups were invited to participate in this study via e-mail, and both included male and female teachers.

Instrument 

A written discourse completion test (WDCT) was used to collect the data in this study, as it is a common measure for eliciting learners' pragmatic production. It consisted of six refusal situations reflecting different degrees of formality, power relation, and distance (see Appendix). The situations included educational, workplace, and daily-life contexts. In terms of power status and familiarity, the situations involved equal and unequal power relations, as well as familiar and unfamiliar interlocutors. Each situation was followed by a response given by an EFL learner. A number of EFL learners were asked to provide a response to each situation, and from these responses the researchers selected one per situation so that the responses to the six situations varied in their degrees of pragmatic appropriateness. Thus the selection focused on pragmatic failure or appropriateness rather than grammatical inaccuracy, as reflected in the choice of the words "unsatisfactory" and "appropriate" in the rating scale. Every response was followed by a rating scale ranging from 1 (very unsatisfactory) to 5 (most appropriate). Below the rating scale for each response, a space entitled "criteria" was provided so that the raters could comment on the pragmatic criteria they applied in rating the response to each situation.

Data Collection Procedure 

The refusal WDCT was administered in paper format to about 20 EFL students. They were studying for a BA in English literature or translation at an Iranian university, and their L1 was Persian. The researchers reviewed the responses to each situation and selected one response per situation. After this selection, the WDCT was sent electronically to NES teachers, who were asked to rate the appropriateness of the responses on a 5-point Likert scale and to write the criteria for their ratings as comments. The questionnaire was first uploaded to the SurveyMonkey® site, and native ESL teachers at different universities in the United States, the United Kingdom, Canada, and Australia were asked via e-mail to complete it electronically on that site. Of the 800 teachers contacted by e-mail, 50 completed the questionnaire in full. Of the 106 nonnative teachers contacted, 50 completed the rating sheets and returned the WDCT with their rating comments.

Data Analysis 

The current study investigated the rating of L2 refusal production by NES and NNES English teachers. In part, it used content analysis to analyze the data. To derive the criteria that both native and nonnative raters considered in rating EFL learners' refusal production, the content of their comments about the pragmalinguistic and sociopragmatic appropriateness or infelicity of each response was analyzed. The analysis of criteria in the comments drew on two sources of insight. The first was refusal strategy frameworks, specifically a modified version of Beebe et al.'s (1990) taxonomy. Although the strategies in that framework represent refusal production rather than functioning as a rating rubric, they helped to identify criteria in the raters' comments related to the (in)appropriateness of refusal in terms of the underrepresentation, overrepresentation, or nonrealization of certain strategies in response to a WDCT situation. The second source of insight was Brown and Levinson's (1987) politeness model, which depicts strategies of positive and negative politeness. The model contributed to the analysis of criteria in the raters' comments that concerned violations of politeness in refusal production. In the quantitative part of the data analysis, frequency counts and t-tests were conducted to measure the difference between the refusal ratings of native and nonnative raters.
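The t-test comparison can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' procedure: it assumes Student's pooled-variance test, which is consistent with the df = 98 reported in the Results for 50 + 50 raters (df = n1 + n2 - 2), and it approximates the two-sided p-value by integrating the t density numerically with Simpson's rule. The function names are hypothetical; in practice a library routine such as `scipy.stats.ttest_ind` would normally be used.

```python
import math
import statistics


def independent_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's independent-samples t statistic with pooled variance.

    Returns (t, df) with df = len(a) + len(b) - 2.
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, each weighted by its n - 1.
    pooled = ((n_a - 1) * statistics.variance(a)
              + (n_b - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Approximate two-sided p-value: 1 - 2 * integral of t_pdf over [0, |t|].

    The integral uses composite Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    # Clamp tiny negative values caused by rounding.
    return max(0.0, 1.0 - 2.0 * area)
```

With df = 98, `two_sided_p(0.30, 98)` comes out at roughly .76, in line with the value reported for Situation 4, and `two_sided_p(7.21, 98)` falls far below .001.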

Results 

Refusal Rating Criteria 

Research Question 1 concerned the criteria used by NES and NNES teachers in rating the speech act of refusal produced by EFL learners. To derive these criteria, the content of the raters' comments stating the reasons for the pragmatic appropriateness of each response was analyzed. This analysis yielded 11 criteria for rating refusal. The criteria, described below, show that both NES and NNES teachers cited pragmatic, rather than grammatical, features as the basis of their ratings of refusal production.

(1) Brief apology. This refusal criterion is important as it prepares the interlocutor for an upcoming refusal. Two examples of this criterion derived from NES and NNES rating comments are given below.

NES comment: I would add an apology before refusing the invitation. 

NNES comment: An apology is needed before any refusal. 

(2) Statement of refusal. The second refusal criterion, a statement of refusal, is a head act that expresses the refusal and gives the interlocutor a clear idea of rejection. An example of the application of this criterion by NES raters is given below. NNES raters did not use this criterion.

NES comment: A proper refusal should include a statement of refusal in terms that are both specific and in a tone appropriate to the social relationship between the one refusing and the requester or inviter on certain occasions.

(3) Offer suitable consolation. This criterion, an offer of suitable consolation, follows the head act to mitigate the refusal. Like the previous criterion, it was not employed by NNES raters in rating the WDCT.

NES comment: If I were her, I would offer a suitable consolation and say "Could we possibly have lunch some other day?"

(4) Irrelevancy of refusal. This criterion focused on the irrelevancy of a refusal. In some cultures, refusal is so indirect that the addressee cannot tell whether it is a refusal or an acceptance of an offer or invitation.

NES comment: This sounds like an acceptance of the apology, not a refusal.

NNES comment: It is an apology acceptance, not refusal!

(5) Explanation/Reasoning. The fifth refusal criterion was an explanation that follows the head act to justify the refusal. After refusing an offer, an invitation, a suggestion, or a piece of advice, some explanation is needed to soften the face-threatening effect.

NES comment: A bit more effort to explain the reason would be required here.

NNES comment: In my opinion, frankly speaking and elaborating on the main issue and reason is better than evading the issue.

(6) Cultural problem. Because pragmatic competence is highly dependent on culture, cultural misinterpretation can occur in EFL contexts. NNES raters did not apply this criterion in their ratings.

NES comment: This might be something cultural but I could never say so.

(7) Dishonesty. Dishonesty in refusal is sometimes mistaken for indirectness and may take the form of false excuses rather than genuine reasons. Only NES raters referred to this criterion in their comments.

NES comment: Being more honest with your reasons for not wanting to ride together would have been easier for the old friend to take.

(8) Thanking. The eighth criterion was thanking, a mitigation device that softens the face-threatening effect and consoles the hearer. NNES raters gave no comments representing this criterion.

NES comment: First, you should thank her for the invitation and then explain the reason for refusing it.

(9) Postponing to another time. This criterion reflected the need for mitigation, in that the face-threatening effect could be softened by postponing the offer or request to another time. Both NES and NNES raters used this criterion to rate the appropriateness of WDCT responses.

NES comment: The speaker should say, "Can I take you up on your offer some other time?"

NNES comment: The speaker could postpone the invitation to [an]other time politely.

(10) Statement of alternative. This criterion was used to evaluate learners' success or failure in offering other choices after a refusal in order to ease the situation for the hearer. The raters' comments below illustrate this criterion.

NES comment: You should say "We can arrange something else for some other time."

NNES comment: In order to not hurt your friend, it's better to ask her to have copies of your notes instead of lending them.

(11) Politeness. The last criterion was politeness, the interpretation of which varies across cultures. Its interpretation depends on the values of social distance, dominance, and degree of imposition in a given context, and the severity of its violation varies cross-culturally. Both NES and NNES raters took this criterion into account in rating WDCT responses.

NES comment: I would not criticize [the] other person without knowing more about the circumstances they are in, so I find this response a bit rude.

NNES comment: He should politely reject the suggestion to show the respect.
Table 1 shows the frequency of NES and NNES raters' criteria for the total WDCT and across all six situations. As some NNES teachers did not provide criteria for their pragmatic ratings in certain WDCT situations, the total number of criteria they reported was far lower than that of NES teachers. This suggests that the NNES group had comparatively lower pragmatic awareness of the rationale behind the (in)appropriateness of the refusals produced in the WDCT situations.
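Building a frequency table such as Table 1 amounts to counting (situation, criterion) pairs separately for each rater group. A minimal Python sketch, assuming each criterion identified in a comment has already been coded with the abbreviations from the note to Table 1; the helper names and any sample data are hypothetical.

```python
from collections import Counter

# Criterion codes, as abbreviated in the note to Table 1.
CRITERIA = ("BA", "SOR", "OSC", "IOR", "E/R", "CP", "D", "T", "PAT", "SOA", "P")


def tally(coded_comments: list[tuple[int, str]]) -> dict[int, Counter]:
    """Count criterion codes per WDCT situation for one rater group.

    Each (situation, code) pair records one criterion identified in one comment.
    """
    per_situation: dict[int, Counter] = {}
    for situation, code in coded_comments:
        if code not in CRITERIA:
            raise ValueError(f"unknown criterion code: {code}")
        per_situation.setdefault(situation, Counter())[code] += 1
    return per_situation


def totals(per_situation: dict[int, Counter]) -> Counter:
    """Sum per-situation counts into counts for the total WDCT."""
    overall: Counter = Counter()
    for counts in per_situation.values():
        overall.update(counts)
    return overall
```

Running `tally` separately on the NES and NNES comments and comparing `sum(totals(...).values())` gives the overall number of criteria each group supplied, the comparison drawn in the paragraph above.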

In general, the raters' comments on the refusals across the situations identified many sources of inappropriateness, such as lack of explanation, politeness, cultural problems, postponing to another time, appropriate use of a brief apology, offering repair, and thanking. NES and NNES raters did not agree in their ratings of most of the refusal cases. Moreover, their criteria differed, apparently owing to a lack of awareness of appropriate refusal, on the one hand, and of English sociocultural norms, on the other. The results indicate that, to assess students' performance accurately, NES and NNES teachers frequently applied a variety of relatively stable criteria that remained applicable from situation to situation. The criteria common across situations were explanation, politeness, cultural problems, speech act appropriateness, and offering compensation.

NES-NNES Refusal Ratings 

Research Question 2 investigated the difference between NES and NNES teachers in rating the speech act of refusal produced by EFL learners. To address this question, descriptive statistics were calculated and t-tests were conducted. Table 2 presents descriptive statistics for refusal ratings by NES and NNES raters. As shown in the table, the overall mean refusal rating was 2.59 for NES raters and 3.29 for NNES raters. Across situations, the highest mean for NES raters was 3.32 and the lowest was 2.06, while the highest mean for NNES raters was 4.02 and the lowest was 2.56. NNES raters' ratings were higher than those of NES raters in all situations. Furthermore, the standard deviations of NNES ratings for the total WDCT and for all six situations were greater, indicating less convergence in their ratings than in the NES ratings.
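The mean and standard deviation comparison can be sketched as follows. This is a minimal Python sketch on hypothetical ratings, assuming, as the paragraph above does, that a larger standard deviation signals less convergence among raters; the function names are illustrative.

```python
import statistics


def describe(ratings: list[int]) -> tuple[float, float]:
    """Return (mean, sample SD) for ratings on the study's 1-5 scale."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return statistics.mean(ratings), statistics.stdev(ratings)


def less_convergent(nes: list[int], nnes: list[int]) -> str:
    """Name the group whose ratings are more dispersed (larger SD)."""
    sd_nes = describe(nes)[1]
    sd_nnes = describe(nnes)[1]
    if sd_nes == sd_nnes:
        return "tie"
    return "NES" if sd_nes > sd_nnes else "NNES"
```

For example, hypothetical NES ratings of 2, 3, 2, 3, 3 (SD about 0.55) against NNES ratings of 2, 5, 3, 4, 3 (SD about 1.14) would mark the NNES group as less convergent, with a higher mean as well.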

Next, an independent-samples t-test was conducted to compare refusal ratings between NES and NNES raters (Table 3). As displayed in the table, there was a significant between-group difference in total refusal ratings (t = 7.21, df = 98, p < .001). NES and NNES ratings differed in all situations except Situation 4 (t = 0.30, df = 98, p = .76). Because multiple t-tests were conducted for Research Question 2, the Bonferroni method was used to obtain an adjusted alpha level and thus control the familywise Type I error rate. The results showed that the differences in five situations (all except Situation 6) remained significant after the Bonferroni correction.
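The Bonferroni adjustment divides the nominal alpha by the number of comparisons, m, so each test is judged against alpha/m. The section does not state m; assuming seven comparisons (the total WDCT plus six situations) and alpha = .05, the adjusted level would be .05/7, about .0071. A minimal Python sketch with hypothetical function names:

```python
def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison alpha that holds the familywise error rate at or below alpha."""
    if m < 1:
        raise ValueError("need at least one comparison")
    return alpha / m


def significant_after_bonferroni(p_values: dict[str, float],
                                 alpha: float = 0.05) -> list[str]:
    """Labels whose p-values remain below the Bonferroni-adjusted alpha."""
    adjusted = bonferroni_alpha(alpha, len(p_values))
    return [label for label, p in p_values.items() if p < adjusted]
```

Under the seven-comparison assumption, a test reported at p = .76, like Situation 4, stays nonsignificant, while a hypothetical p = .004 would survive the correction because .004 < .0071.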
Table 1

Frequency of Refusal Criteria among NES and NNES Raters. 
Note. BA = brief apology; SOR = statement of refusal; OSC = offer suitable consolation; IOR = irrelevancy of refusal; E/R = explanation/reasoning; 
CP = cultural problem; D = dishonesty; T = thanking; PAT = postponing to another time; SOA = statement of alternative; P = politeness. 
These results, together with those on the frequency of criteria, indicate that NES and NNES teachers differed from each other not only in the criteria they applied when evaluating refusals in different WDCT situations, but also in the scores they assigned to the appropriateness of those refusals.

Discussion 

Similar to language assessment in general, pragmatic assessment can be affected 
by three main variables: test task, rater characteristics, and rating criteria. Many 
studies have shed light on the third variable (rating criteria) for assessing the 
performance of language skills (e.g., Eckes, 2005; Gamaroff, 2000). However, 
the interface between rating criteria and pragmatics has remained largely 
unexamined. Hence, this study was conducted in a multiple-raters setting with 
NES and NNES raters to explore the impact of raters on pragmatic assessment 
of the speech act of refusal in terms of rating criteria and rating scores. 

The first objective of this study was to discover how NES and NNES teachers rated L2 refusal production and what criteria they applied in evaluating its appropriateness. With regard to rating criteria, the results showed that NES and NNES teachers applied certain criteria to evaluate the appropriateness of L2 refusal production. Many of these criteria are pragmatically general or universal, in that they can be applied to the assessment of other speech acts. Salient instances of such criteria were explanation and politeness. The largely homogeneous rating criteria, particularly among NES teachers of different nationalities, lend further support to the universality of many pragmatic criteria.

Besides general pragmatic rating criteria, such as politeness and explanation, this study shed light on criteria specific to refusal, including brief apology, statement of refusal, and offer of suitable consolation. The findings indicate that speech act rating requires an awareness of the specific criteria involved in the appropriateness of a particular speech act. Unlike rating language skills, which largely depends on a set of general criteria, pragmatic rating is, to some extent, shaped by the nature of a particular speech act and the criteria specifically related to it. It follows that both groups of raters drew on two types of criteria in their rating: pragmatically general criteria and speech-act-specific criteria. A particularly revealing finding of this study is that most of the refusal-rating criteria corresponded to the strategies needed to produce refusal. This is strong evidence in favour of rating validity. Raters need to use the components of a construct and the strategies underlying performance to maximize their rating validity. In the case of refusal, such correspondence strengthens the validity of pragmatic rating.

The findings from this study also revealed variability among teachers across situations, as evidenced by the frequency of criteria reported. This frequency-based variability may be a determining factor in the rating of pragmatic performance. The most frequent criterion mentioned by NES raters was explanation, which can be attributed to the nature of the refusal speech act. NNES raters, however, applied politeness as the main criterion. It seems that NNES raters mostly regarded politeness as a general criterion and hence overused it to justify any inappropriate production of refusal; as a result, they lost sight of the fact that an appropriate refusal requires specific reasoning for refusing or rejecting a suggestion, an invitation, an offer, or a piece of advice. Variation across situations in the frequency of the rating criteria reported also reflects divergence in evaluating the appropriateness of L2 refusal production in each situation. For instance, in Situation 1, all NES raters applied irrelevancy of refusal because the L2 learner had not produced a refusal; NNES raters, however, mostly applied politeness and sympathized with the interlocutor, a domestic servant, in that WDCT situation. This finding is in line with that of Taguchi (2011), which revealed divergent focus among raters of different nationalities in their use of pragmatic norms when evaluating the appropriateness of speech acts.

The second objective of the study was to explore the ratings that NES 
and NNES English-speaking teachers assigned to refusal production. Results 
showed that NNES raters manifested different rating behaviour by consis¬ 
tently overrating refusals across situations and thus being inclined towards 
leniency in rating. This NES-NNES difference in refusal rating can be ex¬ 
plained in terms of variation in their perceptions of such variables as power, 
social status, and preferred refusal strategies by native and nonnative speak¬ 
ers (e.g., Felix-Brasdefer, 2003; Takahashi, 1996; Takahashi & Beebe, 1987). 
Normative speakers' perception of social status, for example, is among the 
factors that influence their estimation of appropriateness of L2 learners' re¬ 
fusal production. This factor was considered in rating refusals in the current 
study. For instance, in Situation 3, native raters commented on "bother" as a 
cultural problem; however, nonnatives commented on the lack of politeness. 
This is in line with the results of Sadler and Eroz (2001), which showed that Turkish speakers refused less frequently than speakers of other languages but that, when they did refuse, their refusals were always followed by an excuse or explanation. Moreover, Al-Issa (2003) found that Jordanians favoured indirect strategies more than Americans did. In Honglin's (2007) study of American and Chinese participants, the Americans were more direct than the Chinese in their refusals, whereas the Chinese considered refusals to be face-threatening acts and used politeness strategies when refusing. In essence, the Americans tried to solve the problem, while the Chinese tried to restore the relationship between the interlocutors. The same pattern holds for Situation 4, in which native raters evaluated the refusal in terms of its directness, whereas nonnative raters were more concerned with politeness and preserving the relationship with the interlocutor. However, despite the variation in the types of criteria NES and NNES raters employed to measure the appropriateness of the refusal in Situation 4, their ratings were largely similar.
This indicates that similarity in ratings does not entail the application of the same criteria. Whereas rating scores are a product-oriented measure of the difference between NES and NNES raters, analysing the criteria that lead to those scores, that is, a process-oriented approach, is necessary for an in-depth understanding of raters' rating behaviour. Similarly, Youn (2007) found that each rater showed unique bias patterns depending on the test type and speech act.

Conclusions and Implications 

The study revealed the criteria that NES and NNES raters employed in rating refusal production. NES raters applied 11 criteria while assessing L2 refusal production, whereas NNES raters applied 6. The criteria common to both NES and NNES raters across refusal situations were brief apology, irrelevancy of speech act, explanations, postponing to another time, statement of alternative, and politeness. Although NES and NNES raters gave different weight to politeness, it was among the most frequently employed criteria in both groups. Raters' frequent reference to this criterion is compatible with the general perception of politeness as the main measure of pragmatic appropriateness. Emphasized in the pragmatics literature, politeness seems to be the principle that overrides the other criteria for pragmatic appropriateness.

The premise that politeness is a pragmatic universal, and hence has cross-linguistic and cross-cultural realizations, can contribute to convergence in pragmatic rating. However, because perceptions of both the sociocultural norms and the pragmalinguistic realizations of politeness vary, NES and NNES raters differed in how they applied the politeness criterion to the rating of speech acts. NNES raters mostly cited politeness as the leading criterion, whereas NES raters emphasized explanation for the speech act of refusal. Moreover, merely citing the politeness criterion may be misleading, because raters vary in how they evaluate the degree of politeness observed in the production of a speech act.

Generally, NNES raters in this study were more lenient than NES raters, which highlights the need for more pragmatically informed ratings by NNES teachers. This can be achieved through rater training programs that focus on helping NNES teachers recognize effective criteria for rating pragmatic production and thereby improve the accuracy of their ratings. Because rating criteria play a significant role in pragmatic assessment, NNES teachers in EFL contexts such as Iran, where pragmatic awareness of such criteria for speech act production in English is insufficient, should be encouraged to participate in training programs that aim to raise their pragmatics rating consciousness so that their rating criteria more closely approximate those of NES raters. Such a program may include video clips demonstrating native speakers' production of refusals, as well as less appropriate refusals performed by nonnative speakers, along with the rating scores and rating criteria assigned by native teacher raters. NNES teachers, particularly in an EFL context where there are neither established local English norms nor any variety of world English to function as a frame of reference, tend to apply different rating criteria to assess the same pragmatic production. In fact, raters may understand the construct being measured differently, and such differences may directly influence the ratings they assign to test takers' performance in a testing context. As the results of this study show, the scores that NES and NNES raters assigned to students' performance differed, with NNES teachers being more lenient than NES raters. This suggests that the two groups applied different rating criteria to rate the same construct, which in turn signals the need for rater training.

NNES teachers should become conscious of rating criteria through training programs to increase the accuracy of their interlanguage pragmatic (ILP) ratings as measured against the NES benchmark. Therefore, rater training should be implemented in teacher education programs to alter teachers' assessment practice, and decision makers should consider providing such training programs for EFL raters. Furthermore, the significance of the politeness criterion has implications for pragmatics rating. Given that this criterion features prominently in pragmatics rating and consequently affects raters' judgments of pragmatic appropriateness, NNES teachers should have sufficient pragmalinguistic and sociopragmatic competence to underpin their perception of politeness. Although NES raters are comparatively more homogeneous in this regard, pragmatics rating by NNES raters, which is most common in an EFL context, requires a good understanding not only of the pragmalinguistic realization of politeness but also of L2 social norms and conventions, particularly those that diverge from L1 politeness norms. As for NES raters, it should be noted that, had they been from the same national background, they would likely have displayed more homogeneous rating behaviour. This should be taken into account when interpreting the NES data in this study.

Appendix
Refusal Rating 

In the following situations, an English language learner was asked to make refusals. Please read the EFL learner's answer in each situation and rate its appropriateness on the following rating scale. Then provide your criteria and reasons for selecting a particular point (1, 2, 3, 4, or 5) on the scale.
1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate
4. appropriate 5. most appropriate 

1. You arrive at your office and see that your cleaner is upset. You notice 
that he has bumped into an antique vase while cleaning the table and has 
broken the vase. He apologizes to you and wants to pay for it, but you 
don't accept his apology. What would you say? 

Answer: Hey, accidents happen. Don't worry about that. OK? 

1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate 4. appropriate 5. most appropriate

Criteria: 

2. You have just started working in a new company. The first day, you are
walking in the hall and see one of your old friends from university. You
come to know that he lives next to you, in the same neighborhood. He
suggests that every day you get to work together in your car, but you like
to get to work alone, so you refuse his suggestion. What would you say?

Answer: I'd love to but I can't. You are dear to me. I wish I could.

1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate 4. appropriate 5. most appropriate

Criteria: 

3. You meet one of your professors in the hall at university. You like him 
very much and you think he is the best professor at that university. You 
go and greet him. He is very happy to see you and invites you to lunch at 
the university cafeteria. Unfortunately, you have promised your friends 
you would visit them for lunch, so you can't accept his invitation. What 
would you say? 

Answer: I'd love to sir, but I've promised some of my friends to meet them for 
lunch. I hope you don’t mind. Can I bother you some other time? I really don't 
want to pass such a great offer. 

1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate 4. appropriate 5. most appropriate

Criteria: 

4. You are a junior in college. You attend classes regularly and take good 
notes. Your classmate often misses class, and asks you for the lecture 
notes. However, you need the notes yourself and can't lend them to her, 
so you refuse her request. What would you say? 

Answer: Actually I need the notes myself. Why don't you try to attend the 
classes regularly? 

1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate 4. appropriate 5. most appropriate

Criteria: 
5. You are going to refuse an invitation offered to you by your colleague to
an art gallery. What would you say? 

Answer: Sorry, but I can't come. I've some things I should take care of, you 
know. 

1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate 4. appropriate 5. most appropriate

Criteria: 

6. You are trying to reject an invitation offered to you by your older sister 
to her house for a dinner party. You are so busy and can't go there. How 
would you decline her invitation? 

Answer: I'm so busy. Excuse me. 

1. very unsatisfactory 2. unsatisfactory 3. somehow appropriate 4. appropriate 5. most appropriate

Criteria: 